Autotuning of Pattern Runtimes for Accelerated Parallel Systems
Authors
Abstract
Parallel architectures with node-level accelerators promise significant performance improvements over conventional homogeneous systems. To cope with the increased complexity of programming such systems, various pattern-based programming libraries have become available. In this paper we present our work on providing autotuning capabilities for two runtime libraries that offer parallel programming patterns on state-of-the-art heterogeneous hardware. We give a brief overview of these runtime libraries, outline possible integration with existing tuning frameworks, and present initial experimental results.
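The abstract describes the approach only at a high level; as an illustration of the general idea, the sketch below shows a data-parallel map pattern whose work distribution is controlled by a tunable chunk size, together with a minimal measurement-driven autotuner that times each candidate value and keeps the fastest. All identifiers (parallel_map, autotune_chunk_size) and the exhaustive search strategy are hypothetical and are not taken from the runtime libraries discussed in the paper.

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical pattern: a parallel map whose work distribution is governed
// by a tunable chunk size. Each worker processes consecutive chunks of the
// input in round-robin order.
void parallel_map(std::vector<float>& data,
                  const std::function<float(float)>& f,
                  std::size_t chunk_size,
                  unsigned num_threads) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t start = t * chunk_size; start < data.size();
                 start += num_threads * chunk_size) {
                const std::size_t end = std::min(start + chunk_size, data.size());
                for (std::size_t i = start; i < end; ++i)
                    data[i] = f(data[i]);
            }
        });
    }
    for (auto& w : workers) w.join();
}

// Hypothetical autotuner: empirically measures each candidate chunk size and
// returns the fastest one. Real tuning frameworks would use search heuristics
// and persist results, but the measure-and-select feedback loop is the same.
std::size_t autotune_chunk_size(std::size_t n,
                                const std::vector<std::size_t>& candidates,
                                unsigned num_threads) {
    std::size_t best = candidates.front();
    auto best_time = std::chrono::duration<double>::max();
    for (std::size_t c : candidates) {
        std::vector<float> data(n, 1.0f);
        const auto t0 = std::chrono::steady_clock::now();
        parallel_map(data, [](float x) { return x * x + 1.0f; }, c, num_threads);
        const auto elapsed = std::chrono::steady_clock::now() - t0;
        if (elapsed < best_time) { best_time = elapsed; best = c; }
    }
    return best;
}

int main() {
    const unsigned threads = std::max(1u, std::thread::hardware_concurrency());
    // Tune once, then reuse the selected chunk size for later pattern calls
    // on inputs of similar size.
    const std::size_t best = autotune_chunk_size(1 << 22,
                                                 {1024, 4096, 16384, 65536},
                                                 threads);
    return best > 0 ? 0 : 1;
}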
Similar Resources
pOSKI: An Extensible Autotuning Framework to Perform Optimized SpMVs on Multicore Architectures
We have developed pOSKI: the Parallel Optimized Sparse Kernel Interface – an autotuning framework to optimize Sparse Matrix Vector Multiply (SpMV) performance on emerging shared memory multicore architectures. Our autotuning methodology extends previous work done in the scientific computing community targeting serial architectures. In addition to previously explored parallel optimizations, we f...
Compiler-based code generation and autotuning for geometric multigrid on GPU-accelerated supercomputers
GPUs, with their high bandwidths and computational capabilities, are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. As such, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and ma...
Application-independent Autotuning for GPUs
Autotuning is an established technique for adjusting performance-critical parameters of applications to their specific run-time environment. In this paper, we investigate the potential of online autotuning for general purpose computation on GPUs. Our application-independent autotuner AtuneRT optimizes GPU-specific parameters such as block size and loop-unrolling degree. We also discuss the pecu...
Statement of Research
History has shown the benefits of high-level languages, language design, and managed language runtimes on how programmers develop complex and sophisticated systems. High-level languages, such as Java and Standard ML, are strongly typed and provide rich abstraction mechanisms, thereby reducing the time and effort to develop software. Language primitives and abstractions provide semantic guarante...
MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems
MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilitie...
Journal:
Volume / Issue:
Pages: -
Publication date: 2013